Overview
Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 4190 |
| Missing cells | 1620 |
| Missing cells (%) | 2.0% |
| Duplicate rows | 78 |
| Duplicate rows (%) | 1.9% |
| Total size in memory | 654.7 KiB |
| Average record size in memory | 160.0 B |
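The overview figures are mutually consistent, which a little arithmetic can confirm. The sketch below uses only numbers reported in this table; the variable names are illustrative, and the assumption that percentages are taken over all cells (variables × observations) and over all rows is inferred from the figures, not stated in the report.

```python
# Figures copied from the Dataset statistics table.
n_variables = 19
n_observations = 4190
missing_cells = 1620
duplicate_rows = 78
total_size_kib = 654.7

# Assumption: the missing-cell percentage is taken over every cell.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate percentage is taken over all rows.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average record size in bytes (1 KiB = 1024 bytes).
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells:        {total_cells}")
print(f"missing cells (%):  {missing_pct:.1f}")
print(f"duplicate rows (%): {duplicate_pct:.1f}")
print(f"avg record size:    {avg_record_bytes:.1f} B")
```

A record size of 160 bytes would also match 19 eight-byte values plus an eight-byte index entry, though the report does not state the column dtypes.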
Variable types
| Numeric | 19 |
|---|---|
Alerts
Dataset has 78 (1.9%) duplicate rows | Duplicates |
GOODS_DESCRIPTION_len_chars_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_mean and 11 other fields | High correlation |
GOODS_DESCRIPTION_len_chars_mean is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 7 other fields | High correlation |
GOODS_DESCRIPTION_len_chars_median is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 4 other fields | High correlation |
GOODS_DESCRIPTION_len_chars_min is highly overall correlated with GOODS_DESCRIPTION_len_chars_sum and 4 other fields | High correlation |
GOODS_DESCRIPTION_len_chars_std is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 5 other fields | High correlation |
GOODS_DESCRIPTION_len_chars_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 11 other fields | High correlation |
GOODS_DESCRIPTION_len_words_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 11 other fields | High correlation |
GOODS_DESCRIPTION_len_words_mean is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 8 other fields | High correlation |
GOODS_DESCRIPTION_len_words_median is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 4 other fields | High correlation |
GOODS_DESCRIPTION_len_words_min is highly overall correlated with GOODS_DESCRIPTION_len_chars_min and 5 other fields | High correlation |
GOODS_DESCRIPTION_len_words_std is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 6 other fields | High correlation |
GOODS_DESCRIPTION_len_words_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 10 other fields | High correlation |
HS06_count is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 8 other fields | High correlation |
subtokenization_indicator_max is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 8 other fields | High correlation |
subtokenization_indicator_mean is highly overall correlated with subtokenization_indicator_max and 2 other fields | High correlation |
subtokenization_indicator_median is highly overall correlated with subtokenization_indicator_mean | High correlation |
subtokenization_indicator_min is highly overall correlated with GOODS_DESCRIPTION_len_words_sum and 1 other field | High correlation |
subtokenization_indicator_std is highly overall correlated with subtokenization_indicator_max and 1 other field | High correlation |
subtokenization_indicator_sum is highly overall correlated with GOODS_DESCRIPTION_len_chars_max and 7 other fields | High correlation |
GOODS_DESCRIPTION_len_words_std has 540 (12.9%) missing values | Missing |
GOODS_DESCRIPTION_len_chars_std has 540 (12.9%) missing values | Missing |
subtokenization_indicator_std has 540 (12.9%) missing values | Missing |
GOODS_DESCRIPTION_len_words_std has 100 (2.4%) zeros | Zeros |
subtokenization_indicator_std has 106 (2.5%) zeros | Zeros |
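Each of the three `_std` columns has 540 missing values, the same as the number of rows where HS06_count equals 1. One plausible reading, which the report itself does not confirm, is that each row aggregates a group of HS06_count descriptions and the `_std` columns hold a sample standard deviation (n − 1 denominator), which is undefined for a single value. A minimal sketch of that behaviour using Python's standard library; `sample_std` is a hypothetical helper, not part of ydata-profiling:

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation (n - 1 denominator).

    Returns NaN for fewer than two values, mirroring how a missing
    value could arise for a group of size one (an assumption about
    how the profiled columns were built).
    """
    if len(values) < 2:
        return math.nan
    return statistics.stdev(values)


# A single-element group has no defined sample standard deviation.
print(sample_std([5]))                 # nan

# Two values that differ by one give sqrt(0.5).
print(round(sample_std([3, 4]), 10))   # 0.7071067812

# Identical values give a standard deviation of zero.
print(sample_std([4, 4, 4]))           # 0.0
```

The value 0.7071067812 is the most common entry in GOODS_DESCRIPTION_len_words_std, which fits this reading, and the zeros flagged in the alerts would then correspond to groups whose members all share the same value.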
Reproduction
| Analysis started | 2025-05-15 17:56:30.090803 |
|---|---|
| Analysis finished | 2025-05-15 17:59:08.566354 |
| Duration | 2 minutes and 38.48 seconds |
| Software version | ydata-profiling v4.12.1 |
| Download configuration | config.json |
Variables
HS06_count
Real number (ℝ)
High correlation 
| Distinct | 430 |
|---|---|
| Distinct (%) | 10.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 63.909308 |
| Minimum | 1 |
| Maximum | 3869 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 13 |
| Q3 | 49 |
| 95-th percentile | 274 |
| Maximum | 3869 |
| Range | 3868 |
| Interquartile range (IQR) | 46 |
Descriptive statistics
| Standard deviation | 185.02558 |
|---|---|
| Coefficient of variation (CV) | 2.8951273 |
| Kurtosis | 105.46571 |
| Mean | 63.909308 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 8.4442706 |
| Sum | 267780 |
| Variance | 34234.467 |
| Monotonicity | Not monotonic |
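Several of the HS06_count statistics are derived from one another, so they can be cross-checked. The sketch below recomputes them from the reported values only; the variable names are illustrative, and small differences come from rounding in the report.

```python
# Values reported for HS06_count.
n = 4190
mean = 63.909308
std = 185.02558
minimum, maximum = 1, 3869
q1, q3 = 3, 49

total = mean * n                 # Sum = mean x number of observations
variance = std ** 2              # Variance = standard deviation squared
cv = std / mean                  # Coefficient of variation = std / mean
value_range = maximum - minimum  # Range = maximum - minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

print(round(total))        # close to the reported Sum of 267780
print(round(variance, 1))  # close to the reported Variance of 34234.467
print(round(cv, 4))        # close to the reported CV of 2.8951273
print(value_range, iqr)    # 3868 46
```

The same relations hold for the other variables in this report, so the check can be repeated by swapping in their reported values.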
| Value | Count | Frequency (%) |
| 1 | 540 | 12.9% |
| 2 | 319 | 7.6% |
| 3 | 233 | 5.6% |
| 4 | 176 | 4.2% |
| 5 | 158 | 3.8% |
| 7 | 137 | 3.3% |
| 6 | 124 | 3.0% |
| 8 | 113 | 2.7% |
| 9 | 101 | 2.4% |
| 13 | 83 | 2.0% |
| Other values (420) | 2206 | 52.6% |
| Value | Count | Frequency (%) |
| 1 | 540 | 12.9% |
| 2 | 319 | 7.6% |
| 3 | 233 | 5.6% |
| 4 | 176 | 4.2% |
| 5 | 158 | 3.8% |
| 6 | 124 | 3.0% |
| 7 | 137 | 3.3% |
| 8 | 113 | 2.7% |
| 9 | 101 | 2.4% |
| 10 | 51 | 1.2% |
| Value | Count | Frequency (%) |
| 3869 | 1 | < 0.1% |
| 2924 | 1 | < 0.1% |
| 2779 | 1 | < 0.1% |
| 2632 | 1 | < 0.1% |
| 2247 | 1 | < 0.1% |
| 2074 | 1 | < 0.1% |
| 1988 | 1 | < 0.1% |
| 1874 | 1 | < 0.1% |
| 1857 | 1 | < 0.1% |
| 1836 | 1 | < 0.1% |
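The Frequency (%) column is each count divided by the 4190 observations. The sketch below reproduces the formatting seen in these tables, including the `< 0.1%` label for very small shares; `format_frequency` is a hypothetical helper, and the rule for when the label is used is inferred from the tables in this report, not taken from the ydata-profiling source.

```python
N_OBSERVATIONS = 4190  # number of observations reported in the overview


def format_frequency(count, total=N_OBSERVATIONS):
    """Format a count as a percentage of all rows, to one decimal place.

    Assumption: shares that would round to 0.0% are shown as "< 0.1%".
    """
    pct = 100 * count / total
    if 0 < pct < 0.05:
        return "< 0.1%"
    return f"{pct:.1f}%"


print(format_frequency(540))  # 12.9%
print(format_frequency(83))   # 2.0%
print(format_frequency(1))    # < 0.1%
```

For the columns with missing values, the reported percentages are also consistent with a denominator of all 4190 rows, not only the non-missing ones.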
GOODS_DESCRIPTION_len_words_sum
Real number (ℝ)
High correlation 
| Distinct | 907 |
|---|---|
| Distinct (%) | 21.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 293.24773 |
| Minimum | 1 |
| Maximum | 22795 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 12 |
| median | 52 |
| Q3 | 206 |
| 95-th percentile | 1246 |
| Maximum | 22795 |
| Range | 22794 |
| Interquartile range (IQR) | 194 |
Descriptive statistics
| Standard deviation | 939.78084 |
|---|---|
| Coefficient of variation (CV) | 3.2047335 |
| Kurtosis | 181.82486 |
| Mean | 293.24773 |
| Median Absolute Deviation (MAD) | 48 |
| Skewness | 10.765924 |
| Sum | 1228708 |
| Variance | 883188.02 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 197 | 4.7% |
| 3 | 139 | 3.3% |
| 4 | 109 | 2.6% |
| 6 | 101 | 2.4% |
| 5 | 98 | 2.3% |
| 1 | 83 | 2.0% |
| 7 | 81 | 1.9% |
| 9 | 65 | 1.6% |
| 8 | 64 | 1.5% |
| 11 | 56 | 1.3% |
| Other values (897) | 3197 | 76.3% |
| Value | Count | Frequency (%) |
| 1 | 83 | 2.0% |
| 2 | 197 | 4.7% |
| 3 | 139 | 3.3% |
| 4 | 109 | 2.6% |
| 5 | 98 | 2.3% |
| 6 | 101 | 2.4% |
| 7 | 81 | 1.9% |
| 8 | 64 | 1.5% |
| 9 | 65 | 1.6% |
| 10 | 48 | 1.1% |
| Value | Count | Frequency (%) |
| 22795 | 1 | < 0.1% |
| 21639 | 1 | < 0.1% |
| 12896 | 1 | < 0.1% |
| 12530 | 1 | < 0.1% |
| 12241 | 1 | < 0.1% |
| 9676 | 1 | < 0.1% |
| 9353 | 1 | < 0.1% |
| 9042 | 1 | < 0.1% |
| 8818 | 1 | < 0.1% |
| 8680 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_words_min
Real number (ℝ)
High correlation 
| Distinct | 17 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8212411 |
| Minimum | 1 |
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 4 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.3782325 |
|---|---|
| Coefficient of variation (CV) | 0.75675458 |
| Kurtosis | 33.363193 |
| Mean | 1.8212411 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.4492171 |
| Sum | 7631 |
| Variance | 1.8995248 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 2121 | 50.6% |
| 2 | 1452 | 34.7% |
| 3 | 333 | 7.9% |
| 4 | 128 | 3.1% |
| 5 | 54 | 1.3% |
| 6 | 40 | 1.0% |
| 7 | 20 | 0.5% |
| 9 | 11 | 0.3% |
| 8 | 10 | 0.2% |
| 10 | 9 | 0.2% |
| Other values (7) | 12 | 0.3% |
| Value | Count | Frequency (%) |
| 1 | 2121 | 50.6% |
| 2 | 1452 | 34.7% |
| 3 | 333 | 7.9% |
| 4 | 128 | 3.1% |
| 5 | 54 | 1.3% |
| 6 | 40 | 1.0% |
| 7 | 20 | 0.5% |
| 8 | 10 | 0.2% |
| 9 | 11 | 0.3% |
| 10 | 9 | 0.2% |
| Value | Count | Frequency (%) |
| 19 | 2 | < 0.1% |
| 18 | 1 | < 0.1% |
| 16 | 2 | < 0.1% |
| 15 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 13 | 2 | < 0.1% |
| 11 | 3 | 0.1% |
| 10 | 9 | 0.2% |
| 9 | 11 | 0.3% |
| 8 | 10 | 0.2% |
GOODS_DESCRIPTION_len_words_mean
Real number (ℝ)
High correlation 
| Distinct | 1692 |
|---|---|
| Distinct (%) | 40.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.0534472 |
| Minimum | 1 |
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 4 |
| Q3 | 4.7544714 |
| 95-th percentile | 6.6 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 1.7544714 |
Descriptive statistics
| Standard deviation | 1.6385532 |
|---|---|
| Coefficient of variation (CV) | 0.40423698 |
| Kurtosis | 10.007139 |
| Mean | 4.0534472 |
| Median Absolute Deviation (MAD) | 0.84680574 |
| Skewness | 1.8703389 |
| Sum | 16983.944 |
| Variance | 2.6848567 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 268 | 6.4% |
| 3 | 237 | 5.7% |
| 4 | 171 | 4.1% |
| 1 | 95 | 2.3% |
| 5 | 89 | 2.1% |
| 2.5 | 75 | 1.8% |
| 3.5 | 61 | 1.5% |
| 6 | 54 | 1.3% |
| 4.5 | 47 | 1.1% |
| 3.666666667 | 41 | 1.0% |
| Other values (1682) | 3052 | 72.8% |
| Value | Count | Frequency (%) |
| 1 | 95 | 2.3% |
| 1.25 | 1 | < 0.1% |
| 1.333333333 | 8 | 0.2% |
| 1.5 | 38 | 0.9% |
| 1.6 | 2 | < 0.1% |
| 1.666666667 | 14 | 0.3% |
| 1.714285714 | 2 | < 0.1% |
| 1.75 | 5 | 0.1% |
| 1.8 | 5 | 0.1% |
| 1.833333333 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 19 | 2 | < 0.1% |
| 18 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 16 | 2 | < 0.1% |
| 15 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 13.75 | 1 | < 0.1% |
| 13 | 4 | 0.1% |
| 12.98501873 | 1 | < 0.1% |
| 12.25 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_words_median
Real number (ℝ)
High correlation 
| Distinct | 31 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.6504773 |
| Minimum | 1 |
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 3.5 |
| Q3 | 4 |
| 95-th percentile | 6 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.6425555 |
|---|---|
| Coefficient of variation (CV) | 0.44995636 |
| Kurtosis | 12.897529 |
| Mean | 3.6504773 |
| Median Absolute Deviation (MAD) | 0.5 |
| Skewness | 2.46108 |
| Sum | 15295.5 |
| Variance | 2.6979886 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 3 | 1187 | 28.3% |
| 4 | 1100 | 26.3% |
| 2 | 589 | 14.1% |
| 5 | 370 | 8.8% |
| 3.5 | 169 | 4.0% |
| 2.5 | 151 | 3.6% |
| 6 | 141 | 3.4% |
| 1 | 119 | 2.8% |
| 4.5 | 90 | 2.1% |
| 7 | 70 | 1.7% |
| Other values (21) | 204 | 4.9% |
| Value | Count | Frequency (%) |
| 1 | 119 | 2.8% |
| 1.5 | 43 | 1.0% |
| 2 | 589 | 14.1% |
| 2.5 | 151 | 3.6% |
| 3 | 1187 | 28.3% |
| 3.5 | 169 | 4.0% |
| 4 | 1100 | 26.3% |
| 4.5 | 90 | 2.1% |
| 5 | 370 | 8.8% |
| 5.5 | 29 | 0.7% |
| Value | Count | Frequency (%) |
| 19 | 2 | < 0.1% |
| 18 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| 16 | 3 | 0.1% |
| 14 | 2 | < 0.1% |
| 13.5 | 1 | < 0.1% |
| 13 | 7 | 0.2% |
| 12.5 | 1 | < 0.1% |
| 12 | 2 | < 0.1% |
| 11.5 | 2 | < 0.1% |
GOODS_DESCRIPTION_len_words_max
Real number (ℝ)
High correlation 
| Distinct | 36 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.6 |
| Minimum | 1 |
| Maximum | 41 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 8 |
| Q3 | 13 |
| 95-th percentile | 22 |
| Maximum | 41 |
| Range | 40 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 6.1978904 |
|---|---|
| Coefficient of variation (CV) | 0.64561359 |
| Kurtosis | 0.59564961 |
| Mean | 9.6 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.92257265 |
| Sum | 40224 |
| Variance | 38.413846 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4 | 312 | 7.4% |
| 5 | 300 | 7.2% |
| 2 | 299 | 7.1% |
| 3 | 287 | 6.8% |
| 6 | 286 | 6.8% |
| 7 | 274 | 6.5% |
| 10 | 259 | 6.2% |
| 8 | 256 | 6.1% |
| 11 | 237 | 5.7% |
| 9 | 236 | 5.6% |
| Other values (26) | 1444 | 34.5% |
| Value | Count | Frequency (%) |
| 1 | 95 | 2.3% |
| 2 | 299 | 7.1% |
| 3 | 287 | 6.8% |
| 4 | 312 | 7.4% |
| 5 | 300 | 7.2% |
| 6 | 286 | 6.8% |
| 7 | 274 | 6.5% |
| 8 | 256 | 6.1% |
| 9 | 236 | 5.6% |
| 10 | 259 | 6.2% |
| Value | Count | Frequency (%) |
| 41 | 1 | < 0.1% |
| 37 | 2 | < 0.1% |
| 34 | 1 | < 0.1% |
| 33 | 1 | < 0.1% |
| 32 | 3 | 0.1% |
| 31 | 4 | 0.1% |
| 30 | 7 | 0.2% |
| 29 | 7 | 0.2% |
| 28 | 14 | 0.3% |
| 27 | 11 | 0.3% |
GOODS_DESCRIPTION_len_words_std
Real number (ℝ)
High correlation, Missing, Zeros
| Distinct | 2713 |
|---|---|
| Distinct (%) | 74.3% |
| Missing | 540 |
| Missing (%) | 12.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.2447975 |
| Minimum | 0 |
| Maximum | 10.843585 |
| Zeros | 100 |
| Zeros (%) | 2.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.57735027 |
| Q1 | 1.4142136 |
| median | 2.1247171 |
| Q3 | 2.8739789 |
| 95-th percentile | 4.3664968 |
| Maximum | 10.843585 |
| Range | 10.843585 |
| Interquartile range (IQR) | 1.4597653 |
Descriptive statistics
| Standard deviation | 1.2246282 |
|---|---|
| Coefficient of variation (CV) | 0.54554062 |
| Kurtosis | 3.9982845 |
| Mean | 2.2447975 |
| Median Absolute Deviation (MAD) | 0.71050355 |
| Skewness | 1.2127068 |
| Sum | 8193.5108 |
| Variance | 1.4997142 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.7071067812 | 117 | 2.8% |
| 0 | 100 | 2.4% |
| 1.414213562 | 66 | 1.6% |
| 0.5773502692 | 40 | 1.0% |
| 2.121320344 | 39 | 0.9% |
| 1 | 36 | 0.9% |
| 0.5773502692 | 25 | 0.6% |
| 1.154700538 | 22 | 0.5% |
| 0.9574271078 | 17 | 0.4% |
| 2.828427125 | 16 | 0.4% |
| Other values (2703) | 3172 | 75.7% |
| (Missing) | 540 | 12.9% |
| Value | Count | Frequency (%) |
| 0 | 100 | 2.4% |
| 0.2717464882 | 1 | < 0.1% |
| 0.3755338081 | 1 | < 0.1% |
| 0.377964473 | 1 | < 0.1% |
| 0.377964473 | 1 | < 0.1% |
| 0.4082482905 | 2 | < 0.1% |
| 0.4082482905 | 1 | < 0.1% |
| 0.4409585518 | 1 | < 0.1% |
| 0.4472135955 | 1 | < 0.1% |
| 0.4472135955 | 5 | 0.1% |
| Value | Count | Frequency (%) |
| 10.84358489 | 1 | < 0.1% |
| 10.44030651 | 1 | < 0.1% |
| 9.812528435 | 1 | < 0.1% |
| 9.192388155 | 2 | < 0.1% |
| 8.986100378 | 1 | < 0.1% |
| 8.845903006 | 1 | < 0.1% |
| 8.485281374 | 1 | < 0.1% |
| 8.354615002 | 1 | < 0.1% |
| 8.185352772 | 1 | < 0.1% |
| 8.082903769 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_sum
Real number (ℝ)
High correlation 
| Distinct | 1888 |
|---|---|
| Distinct (%) | 45.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1867.648 |
| Minimum | 3 |
| Maximum | 167738 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 14 |
| Q1 | 75 |
| median | 334 |
| Q3 | 1324.75 |
| 95-th percentile | 7973.2 |
| Maximum | 167738 |
| Range | 167735 |
| Interquartile range (IQR) | 1249.75 |
Descriptive statistics
| Standard deviation | 6206.2527 |
|---|---|
| Coefficient of variation (CV) | 3.3230313 |
| Kurtosis | 247.35778 |
| Mean | 1867.648 |
| Median Absolute Deviation (MAD) | 306 |
| Skewness | 12.426205 |
| Sum | 7825445 |
| Variance | 38517572 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 23 | 34 | 0.8% |
| 13 | 32 | 0.8% |
| 14 | 31 | 0.7% |
| 11 | 31 | 0.7% |
| 12 | 28 | 0.7% |
| 10 | 26 | 0.6% |
| 19 | 25 | 0.6% |
| 25 | 24 | 0.6% |
| 26 | 23 | 0.5% |
| 17 | 22 | 0.5% |
| Other values (1878) | 3914 | 93.4% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 16 | 0.4% |
| 5 | 12 | 0.3% |
| 6 | 11 | 0.3% |
| 7 | 13 | 0.3% |
| 8 | 13 | 0.3% |
| 9 | 19 | 0.5% |
| 10 | 26 | 0.6% |
| 11 | 31 | 0.7% |
| 12 | 28 | 0.7% |
| Value | Count | Frequency (%) |
| 167738 | 1 | < 0.1% |
| 156074 | 1 | < 0.1% |
| 82898 | 1 | < 0.1% |
| 82038 | 1 | < 0.1% |
| 73835 | 1 | < 0.1% |
| 64804 | 1 | < 0.1% |
| 64331 | 1 | < 0.1% |
| 59853 | 1 | < 0.1% |
| 55286 | 1 | < 0.1% |
| 53641 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_min
Real number (ℝ)
High correlation 
| Distinct | 72 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.419093 |
| Minimum | 2 |
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 6 |
| median | 9 |
| Q3 | 13 |
| 95-th percentile | 26 |
| Maximum | 150 |
| Range | 148 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 9.3133826 |
|---|---|
| Coefficient of variation (CV) | 0.8155974 |
| Kurtosis | 32.335824 |
| Mean | 11.419093 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 4.2796572 |
| Sum | 47846 |
| Variance | 86.739096 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 6 | 417 | 10.0% |
| 7 | 355 | 8.5% |
| 8 | 341 | 8.1% |
| 5 | 334 | 8.0% |
| 9 | 333 | 7.9% |
| 10 | 312 | 7.4% |
| 11 | 274 | 6.5% |
| 4 | 264 | 6.3% |
| 12 | 224 | 5.3% |
| 13 | 178 | 4.2% |
| Other values (62) | 1158 | 27.6% |
| Value | Count | Frequency (%) |
| 2 | 15 | 0.4% |
| 3 | 148 | 3.5% |
| 4 | 264 | 6.3% |
| 5 | 334 | 8.0% |
| 6 | 417 | 10.0% |
| 7 | 355 | 8.5% |
| 8 | 341 | 8.1% |
| 9 | 333 | 7.9% |
| 10 | 312 | 7.4% |
| 11 | 274 | 6.5% |
| Value | Count | Frequency (%) |
| 150 | 1 | < 0.1% |
| 102 | 2 | < 0.1% |
| 100 | 1 | < 0.1% |
| 99 | 2 | < 0.1% |
| 97 | 1 | < 0.1% |
| 88 | 1 | < 0.1% |
| 84 | 1 | < 0.1% |
| 83 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 78 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_mean
Real number (ℝ)
High correlation 
| Distinct | 2436 |
|---|---|
| Distinct (%) | 58.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.040292 |
| Minimum | 3 |
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 19.930159 |
| median | 25.10156 |
| Q3 | 30.746164 |
| 95-th percentile | 43.251026 |
| Maximum | 150 |
| Range | 147 |
| Interquartile range (IQR) | 10.816006 |
Descriptive statistics
| Standard deviation | 10.620201 |
|---|---|
| Coefficient of variation (CV) | 0.40783726 |
| Kurtosis | 10.939805 |
| Mean | 26.040292 |
| Median Absolute Deviation (MAD) | 5.3984396 |
| Skewness | 1.9069107 |
| Sum | 109108.82 |
| Variance | 112.78868 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 13 | 47 | 1.1% |
| 15 | 41 | 1.0% |
| 14 | 41 | 1.0% |
| 19 | 40 | 1.0% |
| 24 | 38 | 0.9% |
| 10 | 37 | 0.9% |
| 11 | 37 | 0.9% |
| 21 | 37 | 0.9% |
| 23 | 36 | 0.9% |
| 25 | 36 | 0.9% |
| Other values (2426) | 3800 | 90.7% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 16 | 0.4% |
| 4.5 | 4 | 0.1% |
| 5 | 13 | 0.3% |
| 5.666666667 | 1 | < 0.1% |
| 6 | 12 | 0.3% |
| 6.5 | 2 | < 0.1% |
| 7 | 18 | 0.4% |
| 7.333333333 | 2 | < 0.1% |
| 7.5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 150 | 1 | < 0.1% |
| 104 | 1 | < 0.1% |
| 103.25 | 1 | < 0.1% |
| 102 | 1 | < 0.1% |
| 100 | 1 | < 0.1% |
| 99 | 2 | < 0.1% |
| 97 | 1 | < 0.1% |
| 88 | 1 | < 0.1% |
| 86.68913858 | 1 | < 0.1% |
| 86.66666667 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_median
Real number (ℝ)
High correlation 
| Distinct | 149 |
|---|---|
| Distinct (%) | 3.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 23.718138 |
| Minimum | 3 |
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 11 |
| Q1 | 18 |
| median | 22.5 |
| Q3 | 27.5 |
| 95-th percentile | 41 |
| Maximum | 150 |
| Range | 147 |
| Interquartile range (IQR) | 9.5 |
Descriptive statistics
| Standard deviation | 10.587122 |
|---|---|
| Coefficient of variation (CV) | 0.4463724 |
| Kurtosis | 14.599156 |
| Mean | 23.718138 |
| Median Absolute Deviation (MAD) | 4.5 |
| Skewness | 2.569964 |
| Sum | 99379 |
| Variance | 112.08716 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 22 | 198 | 4.7% |
| 24 | 196 | 4.7% |
| 20 | 195 | 4.7% |
| 21 | 191 | 4.6% |
| 23 | 186 | 4.4% |
| 25 | 170 | 4.1% |
| 19 | 169 | 4.0% |
| 18 | 158 | 3.8% |
| 26 | 158 | 3.8% |
| 27 | 152 | 3.6% |
| Other values (139) | 2417 | 57.7% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 17 | 0.4% |
| 4.5 | 4 | 0.1% |
| 5 | 14 | 0.3% |
| 5.5 | 1 | < 0.1% |
| 6 | 16 | 0.4% |
| 6.5 | 2 | < 0.1% |
| 7 | 21 | 0.5% |
| 7.5 | 1 | < 0.1% |
| 8 | 17 | 0.4% |
| Value | Count | Frequency (%) |
| 150 | 1 | < 0.1% |
| 104 | 1 | < 0.1% |
| 102 | 1 | < 0.1% |
| 100 | 1 | < 0.1% |
| 99 | 2 | < 0.1% |
| 97 | 2 | < 0.1% |
| 95.5 | 1 | < 0.1% |
| 93.5 | 1 | < 0.1% |
| 93 | 1 | < 0.1% |
| 90 | 1 | < 0.1% |
GOODS_DESCRIPTION_len_chars_max
Real number (ℝ)
High correlation 
| Distinct | 148 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 60.268974 |
| Minimum | 3 |
| Maximum | 150 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 12 |
| Q1 | 30 |
| median | 53 |
| Q3 | 84 |
| 95-th percentile | 143 |
| Maximum | 150 |
| Range | 147 |
| Interquartile range (IQR) | 54 |
Descriptive statistics
| Standard deviation | 37.391868 |
|---|---|
| Coefficient of variation (CV) | 0.62041654 |
| Kurtosis | -0.23455498 |
| Mean | 60.268974 |
| Median Absolute Deviation (MAD) | 27 |
| Skewness | 0.71280785 |
| Sum | 252527 |
| Variance | 1398.1518 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 150 | 134 | 3.2% |
| 100 | 114 | 2.7% |
| 80 | 83 | 2.0% |
| 26 | 63 | 1.5% |
| 32 | 63 | 1.5% |
| 46 | 62 | 1.5% |
| 30 | 60 | 1.4% |
| 23 | 59 | 1.4% |
| 31 | 55 | 1.3% |
| 33 | 53 | 1.3% |
| Other values (138) | 3444 | 82.2% |
| Value | Count | Frequency (%) |
| 3 | 1 | < 0.1% |
| 4 | 16 | 0.4% |
| 5 | 16 | 0.4% |
| 6 | 12 | 0.3% |
| 7 | 16 | 0.4% |
| 8 | 16 | 0.4% |
| 9 | 17 | 0.4% |
| 10 | 38 | 0.9% |
| 11 | 40 | 1.0% |
| 12 | 46 | 1.1% |
| Value | Count | Frequency (%) |
| 150 | 134 | 3.2% |
| 149 | 35 | 0.8% |
| 148 | 9 | 0.2% |
| 147 | 11 | 0.3% |
| 146 | 1 | < 0.1% |
| 145 | 6 | 0.1% |
| 144 | 7 | 0.2% |
| 143 | 11 | 0.3% |
| 142 | 6 | 0.1% |
| 141 | 5 | 0.1% |
GOODS_DESCRIPTION_len_chars_std
Real number (ℝ)
High correlation, Missing
| Distinct | 3254 |
|---|---|
| Distinct (%) | 89.2% |
| Missing | 540 |
| Missing (%) | 12.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.086485 |
| Minimum | 0 |
| Maximum | 70.741313 |
| Zeros | 18 |
| Zeros (%) | 0.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3.2307911 |
| Q1 | 8.8666728 |
| median | 13.233888 |
| Q3 | 18.211586 |
| 95-th percentile | 27.577164 |
| Maximum | 70.741313 |
| Range | 70.741313 |
| Interquartile range (IQR) | 9.3449129 |
Descriptive statistics
| Standard deviation | 7.7478912 |
|---|---|
| Coefficient of variation (CV) | 0.55002304 |
| Kurtosis | 3.3945739 |
| Mean | 14.086485 |
| Median Absolute Deviation (MAD) | 4.5996349 |
| Skewness | 1.1688069 |
| Sum | 51415.669 |
| Variance | 60.029819 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.7071067812 | 37 | 0.9% |
| 1.414213562 | 22 | 0.5% |
| 3.535533906 | 18 | 0.4% |
| 4.949747468 | 18 | 0.4% |
| 4.242640687 | 18 | 0.4% |
| 0 | 18 | 0.4% |
| 2.121320344 | 17 | 0.4% |
| 2.828427125 | 17 | 0.4% |
| 6.363961031 | 16 | 0.4% |
| 9.192388155 | 13 | 0.3% |
| Other values (3244) | 3456 | 82.5% |
| (Missing) | 540 | 12.9% |
| Value | Count | Frequency (%) |
| 0 | 18 | 0.4% |
| 0.5 | 1 | < 0.1% |
| 0.5773502692 | 1 | < 0.1% |
| 0.5773502692 | 1 | < 0.1% |
| 0.5773502692 | 1 | < 0.1% |
| 0.5773502692 | 3 | 0.1% |
| 0.7071067812 | 37 | 0.9% |
| 0.9831920803 | 1 | < 0.1% |
| 1 | 2 | < 0.1% |
| 1.10194633 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 70.74131278 | 1 | < 0.1% |
| 66.50563886 | 1 | < 0.1% |
| 58.39777393 | 1 | < 0.1% |
| 56.5714887 | 1 | < 0.1% |
| 53.913787 | 1 | < 0.1% |
| 53.03300859 | 1 | < 0.1% |
| 51.83338692 | 1 | < 0.1% |
| 50.52062285 | 1 | < 0.1% |
| 49.17387224 | 1 | < 0.1% |
| 48.49742261 | 1 | < 0.1% |
subtokenization_indicator_sum
Real number (ℝ)
High correlation 
| Distinct | 3049 |
|---|---|
| Distinct (%) | 72.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 124.44661 |
| Minimum | 1 |
| Maximum | 9222.5029 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1.3333333 |
| Q1 | 6 |
| median | 22.655934 |
| Q3 | 88.32619 |
| 95-th percentile | 522.78653 |
| Maximum | 9222.5029 |
| Range | 9221.5029 |
| Interquartile range (IQR) | 82.32619 |
Descriptive statistics
| Standard deviation | 399.72429 |
|---|---|
| Coefficient of variation (CV) | 3.2120142 |
| Kurtosis | 176.86215 |
| Mean | 124.44661 |
| Median Absolute Deviation (MAD) | 20.155934 |
| Skewness | 10.723324 |
| Sum | 521431.31 |
| Variance | 159779.51 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 182 | 4.3% |
| 2 | 124 | 3.0% |
| 3 | 87 | 2.1% |
| 1.5 | 51 | 1.2% |
| 4 | 44 | 1.1% |
| 2.5 | 41 | 1.0% |
| 3.5 | 36 | 0.9% |
| 5 | 23 | 0.5% |
| 1.333333333 | 20 | 0.5% |
| 6 | 20 | 0.5% |
| Other values (3039) | 3562 | 85.0% |
| Value | Count | Frequency (%) |
| 1 | 182 | 4.3% |
| 1.125 | 2 | < 0.1% |
| 1.142857143 | 2 | < 0.1% |
| 1.157894737 | 2 | < 0.1% |
| 1.166666667 | 3 | 0.1% |
| 1.2 | 3 | 0.1% |
| 1.25 | 8 | 0.2% |
| 1.272727273 | 1 | < 0.1% |
| 1.285714286 | 1 | < 0.1% |
| 1.333333333 | 20 | 0.5% |
| Value | Count | Frequency (%) |
| 9222.502852 | 1 | < 0.1% |
| 9110.311291 | 1 | < 0.1% |
| 6992.023549 | 1 | < 0.1% |
| 5652.101169 | 1 | < 0.1% |
| 5143.737089 | 1 | < 0.1% |
| 3882.088258 | 1 | < 0.1% |
| 3719.079594 | 1 | < 0.1% |
| 3710.170501 | 1 | < 0.1% |
| 3643.383152 | 1 | < 0.1% |
| 3587.347869 | 1 | < 0.1% |
subtokenization_indicator_min
Real number (ℝ)
High correlation 
| Distinct | 100 |
|---|---|
| Distinct (%) | 2.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.2090898 |
| Minimum | 1 |
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1.1666667 |
| 95-th percentile | 2 |
| Maximum | 10 |
| Range | 9 |
| Interquartile range (IQR) | 0.16666667 |
Descriptive statistics
| Standard deviation | 0.53483173 |
|---|---|
| Coefficient of variation (CV) | 0.44234245 |
| Kurtosis | 49.469769 |
| Mean | 1.2090898 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.4195829 |
| Sum | 5066.0861 |
| Variance | 0.28604498 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 3099 | 74.0% |
| 1.5 | 176 | 4.2% |
| 2 | 154 | 3.7% |
| 1.333333333 | 128 | 3.1% |
| 1.25 | 80 | 1.9% |
| 1.666666667 | 56 | 1.3% |
| 3 | 51 | 1.2% |
| 1.75 | 38 | 0.9% |
| 1.2 | 34 | 0.8% |
| 1.6 | 28 | 0.7% |
| Other values (90) | 346 | 8.3% |
| Value | Count | Frequency (%) |
| 1 | 3099 | 74.0% |
| 1.058823529 | 1 | < 0.1% |
| 1.071428571 | 1 | < 0.1% |
| 1.076923077 | 1 | < 0.1% |
| 1.083333333 | 1 | < 0.1% |
| 1.090909091 | 2 | < 0.1% |
| 1.1 | 1 | < 0.1% |
| 1.111111111 | 2 | < 0.1% |
| 1.125 | 10 | 0.2% |
| 1.142857143 | 13 | 0.3% |
| Value | Count | Frequency (%) |
| 10 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 7 | 3 | 0.1% |
| 6.5 | 1 | < 0.1% |
| 6.307692308 | 1 | < 0.1% |
| 6 | 3 | 0.1% |
| 5.4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 4.666666667 | 1 | < 0.1% |
| 4.333333333 | 1 | < 0.1% |
subtokenization_indicator_mean
Real number (ℝ)
High correlation 
| Distinct | 3056 |
|---|---|
| Distinct (%) | 72.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.8574769 |
| Minimum | 1 |
| Maximum | 10.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1.5 |
| median | 1.7524351 |
| Q3 | 2.0586465 |
| 95-th percentile | 2.9238435 |
| Maximum | 10.5 |
| Range | 9.5 |
| Interquartile range (IQR) | 0.55864652 |
Descriptive statistics
| Standard deviation | 0.65080889 |
|---|---|
| Coefficient of variation (CV) | 0.35037254 |
| Kurtosis | 27.167124 |
| Mean | 1.8574769 |
| Median Absolute Deviation (MAD) | 0.26732972 |
| Skewness | 3.5414724 |
| Sum | 7782.8281 |
| Variance | 0.42355222 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 257 | 6.1% |
| 1.5 | 104 | 2.5% |
| 2 | 103 | 2.5% |
| 3 | 46 | 1.1% |
| 1.25 | 39 | 0.9% |
| 1.75 | 38 | 0.9% |
| 1.333333333 | 38 | 0.9% |
| 2.5 | 32 | 0.8% |
| 1.666666667 | 29 | 0.7% |
| 2.333333333 | 16 | 0.4% |
| Other values (3046) | 3488 | 83.2% |
| Value | Count | Frequency (%) |
| 1 | 257 | 6.1% |
| 1.0125 | 1 | < 0.1% |
| 1.047619048 | 1 | < 0.1% |
| 1.055555556 | 1 | < 0.1% |
| 1.058080808 | 1 | < 0.1% |
| 1.060606061 | 1 | < 0.1% |
| 1.066666667 | 1 | < 0.1% |
| 1.071428571 | 1 | < 0.1% |
| 1.075 | 1 | < 0.1% |
| 1.083333333 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 10.5 | 1 | < 0.1% |
| 9.2 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| 8.166666667 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7.944444444 | 1 | < 0.1% |
| 7 | 3 | 0.1% |
| 6.5 | 2 | < 0.1% |
| 6.307692308 | 1 | < 0.1% |
| 6 | 3 | 0.1% |
subtokenization_indicator_median
Real number (ℝ)
High correlation 
| Distinct | 405 |
|---|---|
| Distinct (%) | 9.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.706521 |
| Minimum | 1 |
| Maximum | 10.5 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1.3333333 |
| median | 1.5833333 |
| Q3 | 2 |
| 95-th percentile | 2.7652778 |
| Maximum | 10.5 |
| Range | 9.5 |
| Interquartile range (IQR) | 0.66666667 |
Descriptive statistics
| Standard deviation | 0.63635088 |
|---|---|
| Coefficient of variation (CV) | 0.37289367 |
| Kurtosis | 29.009321 |
| Mean | 1.706521 |
| Median Absolute Deviation (MAD) | 0.25 |
| Skewness | 3.7469941 |
| Sum | 7150.3229 |
| Variance | 0.40494244 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.5 | 649 | 15.5% |
| 1 | 455 | 10.9% |
| 2 | 439 | 10.5% |
| 1.666666667 | 257 | 6.1% |
| 1.333333333 | 223 | 5.3% |
| 1.75 | 177 | 4.2% |
| 1.25 | 124 | 3.0% |
| 2.5 | 83 | 2.0% |
| 1.6 | 81 | 1.9% |
| 3 | 79 | 1.9% |
| Other values (395) | 1623 | 38.7% |
| Value | Count | Frequency (%) |
| 1 | 455 | 10.9% |
| 1.038461538 | 1 | < 0.1% |
| 1.045454545 | 1 | < 0.1% |
| 1.05 | 1 | < 0.1% |
| 1.055555556 | 3 | 0.1% |
| 1.0625 | 1 | < 0.1% |
| 1.071428571 | 3 | 0.1% |
| 1.083333333 | 6 | 0.1% |
| 1.090909091 | 1 | < 0.1% |
| 1.1 | 10 | 0.2% |
| Value | Count | Frequency (%) |
| 10.5 | 1 | < 0.1% |
| 8.5 | 2 | < 0.1% |
| 8.166666667 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7 | 3 | 0.1% |
| 6.5 | 2 | < 0.1% |
| 6.307692308 | 1 | < 0.1% |
| 6 | 5 | 0.1% |
| 5.5 | 1 | < 0.1% |
| 5.4 | 1 | < 0.1% |
subtokenization_indicator_max
Real number (ℝ)
High correlation 
| Distinct | 239 |
|---|---|
| Distinct (%) | 5.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.2938163 |
| Minimum | 1 |
| Maximum | 59 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2.1428571 |
| median | 3.25 |
| Q3 | 5 |
| 95-th percentile | 11 |
| Maximum | 59 |
| Range | 58 |
| Interquartile range (IQR) | 2.8571429 |
Descriptive statistics
| Standard deviation | 3.711385 |
|---|---|
| Coefficient of variation (CV) | 0.86435579 |
| Kurtosis | 22.677383 |
| Mean | 4.2938163 |
| Median Absolute Deviation (MAD) | 1.25 |
| Skewness | 3.5303829 |
| Sum | 17991.09 |
| Variance | 13.774379 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 420 | 10.0% |
| 2 | 337 | 8.0% |
| 4 | 334 | 8.0% |
| 1 | 257 | 6.1% |
| 5 | 198 | 4.7% |
| 2.5 | 193 | 4.6% |
| 3.5 | 151 | 3.6% |
| 6 | 147 | 3.5% |
| 1.5 | 132 | 3.2% |
| 7 | 102 | 2.4% |
| Other values (229) | 1919 | 45.8% |
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 257 | 6.1% |
| 1.1 | 1 | < 0.1% |
| 1.125 | 2 | < 0.1% |
| 1.142857143 | 3 | 0.1% |
| 1.157894737 | 2 | < 0.1% |
| 1.166666667 | 4 | 0.1% |
| 1.2 | 7 | 0.2% |
| 1.25 | 16 | 0.4% |
| 1.272727273 | 2 | < 0.1% |
| 1.285714286 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 59 | 1 | < 0.1% |
| 38 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 33 | 1 | < 0.1% |
| 32 | 1 | < 0.1% |
| 30 | 1 | < 0.1% |
| 27 | 3 | 0.1% |
| 26 | 3 | 0.1% |
| 25 | 4 | 0.1% |
| 24 | 3 | 0.1% |
subtokenization_indicator_std
Real number (ℝ)
High correlation, Missing, Zeros
| Distinct | 3294 |
|---|---|
| Distinct (%) | 90.2% |
| Missing | 540 |
| Missing (%) | 12.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.80102323 |
| Minimum | 0 |
|---|---|
| Maximum | 9.8994949 |
| Zeros | 106 |
| Zeros (%) | 2.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 65.5 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.1767767 |
| Q1 | 0.47434693 |
| median | 0.70001689 |
| Q3 | 0.95482551 |
| 95-th percentile | 1.714059 |
| Maximum | 9.8994949 |
| Range | 9.8994949 |
| Interquartile range (IQR) | 0.48047858 |
Descriptive statistics
| Standard deviation | 0.63966077 |
|---|---|
| Coefficient of variation (CV) | 0.79855458 |
| Kurtosis | 43.937135 |
| Mean | 0.80102323 |
| Median Absolute Deviation (MAD) | 0.23795869 |
| Skewness | 4.8950393 |
| Sum | 2923.7348 |
| Variance | 0.40916591 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 106 | 2.5% |
| 0.3535533906 | 40 | 1.0% |
| 0.7071067812 | 34 | 0.8% |
| 0.1767766953 | 15 | 0.4% |
| 0.5773502692 | 10 | 0.2% |
| 1.060660172 | 10 | 0.2% |
| 0.2886751346 | 9 | 0.2% |
| 1.414213562 | 9 | 0.2% |
| 0.5 | 9 | 0.2% |
| 0.4714045208 | 7 | 0.2% |
| Other values (3284) | 3401 | 81.2% |
| (Missing) | 540 | 12.9% |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 106 | 2.5% |
| 0.03535533906 | 1 | < 0.1% |
| 0.03936479108 | 1 | < 0.1% |
| 0.04040610178 | 1 | < 0.1% |
| 0.04123930494 | 1 | < 0.1% |
| 0.04614625211 | 1 | < 0.1% |
| 0.04714045208 | 1 | < 0.1% |
| 0.04714045208 | 1 | < 0.1% |
| 0.05773502692 | 2 | < 0.1% |
| 0.0589255651 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
|---|---|---|
| 9.899494937 | 1 | < 0.1% |
| 8.718328926 | 1 | < 0.1% |
| 8.249579114 | 1 | < 0.1% |
| 8.030236228 | 1 | < 0.1% |
| 7.653430603 | 1 | < 0.1% |
| 7.334150943 | 1 | < 0.1% |
| 6.363961031 | 1 | < 0.1% |
| 6.128506881 | 1 | < 0.1% |
| 5.880944501 | 1 | < 0.1% |
| 5.876741066 | 1 | < 0.1% |
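The Frequency (%) values in the tables for this variable are consistent with each count being divided by the total number of observations (4190), missing rows included: 106 zeros give 2.5% and 540 missing give 12.9%. The sketch below reproduces that arithmetic. The `frequency_label` helper and its rounding rule are assumptions inferred from the printed values, not the profiler's actual code.

```python
TOTAL_OBSERVATIONS = 4190  # "Number of observations" from the report overview


def frequency_label(count: int, total: int = TOTAL_OBSERVATIONS) -> str:
    """Format a count as a share of all observations, to one decimal place.

    Assumption: shares that round to 0.0 are shown as "< 0.1%",
    which matches the rows visible in this report.
    """
    pct = round(100 * count / total, 1)
    return "< 0.1%" if pct < 0.1 else f"{pct:.1f}%"


# Rows taken from the subtokenization_indicator_std tables
for label, count in [("0", 106), ("(Missing)", 540), ("9.899494937", 1)]:
    print(label, count, frequency_label(count))
```

Under this rule a value that occurs 3 times is labelled 0.1%, while a value that occurs once or twice is labelled < 0.1%.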
Interactions
Correlations
| | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_words_sum | HS06_count | subtokenization_indicator_max | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_min | subtokenization_indicator_std | subtokenization_indicator_sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GOODS_DESCRIPTION_len_chars_max | 1.000 | 0.716 | 0.520 | -0.361 | 0.842 | 0.862 | 0.963 | 0.706 | 0.504 | -0.335 | 0.804 | 0.854 | 0.789 | 0.630 | 0.202 | 0.102 | -0.357 | 0.331 | 0.788 |
| GOODS_DESCRIPTION_len_chars_mean | 0.716 | 1.000 | 0.909 | 0.165 | 0.728 | 0.501 | 0.685 | 0.929 | 0.819 | 0.147 | 0.714 | 0.486 | 0.351 | 0.323 | 0.255 | 0.231 | -0.025 | 0.169 | 0.371 |
| GOODS_DESCRIPTION_len_chars_median | 0.520 | 0.909 | 1.000 | 0.245 | 0.462 | 0.393 | 0.503 | 0.840 | 0.875 | 0.227 | 0.473 | 0.379 | 0.249 | 0.243 | 0.253 | 0.257 | 0.055 | 0.115 | 0.274 |
| GOODS_DESCRIPTION_len_chars_min | -0.361 | 0.165 | 0.245 | 1.000 | -0.182 | -0.552 | -0.380 | 0.101 | 0.193 | 0.837 | -0.166 | -0.562 | -0.641 | -0.481 | -0.033 | 0.068 | 0.461 | -0.278 | -0.623 |
| GOODS_DESCRIPTION_len_chars_std | 0.842 | 0.728 | 0.462 | -0.182 | 1.000 | 0.506 | 0.787 | 0.684 | 0.414 | -0.169 | 0.913 | 0.495 | 0.402 | 0.339 | 0.158 | 0.105 | -0.156 | 0.244 | 0.407 |
| GOODS_DESCRIPTION_len_chars_sum | 0.862 | 0.501 | 0.393 | -0.552 | 0.506 | 1.000 | 0.864 | 0.526 | 0.402 | -0.518 | 0.516 | 0.998 | 0.984 | 0.753 | 0.195 | 0.074 | -0.490 | 0.394 | 0.977 |
| GOODS_DESCRIPTION_len_words_max | 0.963 | 0.685 | 0.503 | -0.380 | 0.787 | 0.864 | 1.000 | 0.731 | 0.521 | -0.342 | 0.845 | 0.867 | 0.798 | 0.603 | 0.155 | 0.060 | -0.381 | 0.298 | 0.788 |
| GOODS_DESCRIPTION_len_words_mean | 0.706 | 0.929 | 0.840 | 0.101 | 0.684 | 0.526 | 0.731 | 1.000 | 0.886 | 0.147 | 0.750 | 0.533 | 0.390 | 0.261 | 0.121 | 0.114 | -0.100 | 0.067 | 0.386 |
| GOODS_DESCRIPTION_len_words_median | 0.504 | 0.819 | 0.875 | 0.193 | 0.414 | 0.402 | 0.521 | 0.886 | 1.000 | 0.236 | 0.458 | 0.409 | 0.273 | 0.161 | 0.101 | 0.121 | -0.021 | -0.002 | 0.271 |
| GOODS_DESCRIPTION_len_words_min | -0.335 | 0.147 | 0.227 | 0.837 | -0.169 | -0.518 | -0.342 | 0.147 | 0.236 | 1.000 | -0.180 | -0.514 | -0.601 | -0.568 | -0.187 | -0.058 | 0.368 | -0.402 | -0.612 |
| GOODS_DESCRIPTION_len_words_std | 0.804 | 0.714 | 0.473 | -0.166 | 0.913 | 0.516 | 0.845 | 0.750 | 0.458 | -0.180 | 1.000 | 0.519 | 0.414 | 0.324 | 0.134 | 0.087 | -0.149 | 0.216 | 0.413 |
| GOODS_DESCRIPTION_len_words_sum | 0.854 | 0.486 | 0.379 | -0.562 | 0.495 | 0.998 | 0.867 | 0.533 | 0.409 | -0.514 | 0.519 | 1.000 | 0.985 | 0.734 | 0.167 | 0.050 | -0.504 | 0.372 | 0.972 |
| HS06_count | 0.789 | 0.351 | 0.249 | -0.641 | 0.402 | 0.984 | 0.798 | 0.390 | 0.273 | -0.601 | 0.414 | 0.985 | 1.000 | 0.755 | 0.167 | 0.038 | -0.536 | 0.398 | 0.988 |
| subtokenization_indicator_max | 0.630 | 0.323 | 0.243 | -0.481 | 0.339 | 0.753 | 0.603 | 0.261 | 0.161 | -0.568 | 0.324 | 0.734 | 0.755 | 1.000 | 0.620 | 0.390 | -0.212 | 0.863 | 0.827 |
| subtokenization_indicator_mean | 0.202 | 0.255 | 0.253 | -0.033 | 0.158 | 0.195 | 0.155 | 0.121 | 0.101 | -0.187 | 0.134 | 0.167 | 0.167 | 0.620 | 1.000 | 0.906 | 0.400 | 0.652 | 0.303 |
| subtokenization_indicator_median | 0.102 | 0.231 | 0.257 | 0.068 | 0.105 | 0.074 | 0.060 | 0.114 | 0.121 | -0.058 | 0.087 | 0.050 | 0.038 | 0.390 | 0.906 | 1.000 | 0.495 | 0.366 | 0.166 |
| subtokenization_indicator_min | -0.357 | -0.025 | 0.055 | 0.461 | -0.156 | -0.490 | -0.381 | -0.100 | -0.021 | 0.368 | -0.149 | -0.504 | -0.536 | -0.212 | 0.400 | 0.495 | 1.000 | -0.150 | -0.453 |
| subtokenization_indicator_std | 0.331 | 0.169 | 0.115 | -0.278 | 0.244 | 0.394 | 0.298 | 0.067 | -0.002 | -0.402 | 0.216 | 0.372 | 0.398 | 0.863 | 0.652 | 0.366 | -0.150 | 1.000 | 0.487 |
| subtokenization_indicator_sum | 0.788 | 0.371 | 0.274 | -0.623 | 0.407 | 0.977 | 0.788 | 0.386 | 0.271 | -0.612 | 0.413 | 0.972 | 0.988 | 0.827 | 0.303 | 0.166 | -0.453 | 0.487 | 1.000 |
Missing values
Sample
First rows
| HS06 | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 010121 | 5 | 30 | 4 | 6.0 | 6.0 | 7 | 1.224745 | 172 | 24 | 34.4 | 33.0 | 46 | 8.203658 | 6.738095 | 1.0 | 1.347619 | 1.285714 | 1.666667 | 0.251751 |
| 010130 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 16 | 16 | 16.0 | 16.0 | 16 | NaN | 1.500000 | 1.5 | 1.500000 | 1.500000 | 1.500000 | NaN |
| 010190 | 1 | 3 | 3 | 3.0 | 3.0 | 3 | NaN | 15 | 15 | 15.0 | 15.0 | 15 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN |
| 010221 | 5 | 20 | 2 | 4.0 | 3.0 | 7 | 2.345208 | 103 | 8 | 20.6 | 19.0 | 34 | 11.631853 | 7.000000 | 1.0 | 1.400000 | 1.333333 | 2.000000 | 0.434613 |
| 010229 | 2 | 8 | 3 | 4.0 | 4.0 | 5 | 1.414214 | 47 | 22 | 23.5 | 23.5 | 25 | 2.121320 | 3.333333 | 1.0 | 1.666667 | 1.666667 | 2.333333 | 0.942809 |
| 010231 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 9 | 9 | 9.0 | 9.0 | 9 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN |
| 010290 | 2 | 5 | 2 | 2.5 | 2.5 | 3 | 0.707107 | 20 | 8 | 10.0 | 10.0 | 12 | 2.828427 | 2.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | 0.000000 |
| 010310 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 16 | 16 | 16.0 | 16.0 | 16 | NaN | 2.000000 | 2.0 | 2.000000 | 2.000000 | 2.000000 | NaN |
| 010392 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 4 | 4 | 4.0 | 4.0 | 4 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN |
| 010420 | 3 | 6 | 1 | 2.0 | 1.0 | 4 | 1.732051 | 33 | 4 | 11.0 | 9.0 | 20 | 8.185353 | 3.500000 | 1.0 | 1.166667 | 1.000000 | 1.500000 | 0.288675 |
Last rows
| HS06 | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 961700 | 140 | 532 | 1 | 3.800000 | 3.0 | 14 | 1.885957 | 3367 | 3 | 24.050000 | 21.0 | 88 | 12.518375 | 271.683333 | 1.0 | 1.940595 | 1.732143 | 6.333333 | 0.805809 |
| 961800 | 55 | 149 | 1 | 2.709091 | 2.0 | 6 | 1.242350 | 993 | 5 | 18.054545 | 15.0 | 46 | 8.754720 | 99.133333 | 1.0 | 1.802424 | 1.666667 | 5.000000 | 0.783828 |
| 961900 | 270 | 1312 | 1 | 4.859259 | 5.0 | 17 | 2.714858 | 7045 | 4 | 26.092593 | 27.5 | 73 | 12.376180 | 485.317482 | 1.0 | 1.797472 | 1.571429 | 5.000000 | 0.687515 |
| 962000 | 44 | 166 | 1 | 3.772727 | 3.0 | 9 | 1.951310 | 1010 | 6 | 22.954545 | 21.0 | 57 | 11.309503 | 77.407937 | 1.0 | 1.759271 | 1.550000 | 6.500000 | 0.867186 |
| 970110 | 26 | 68 | 1 | 2.615385 | 3.0 | 5 | 1.267341 | 490 | 6 | 18.846154 | 16.5 | 47 | 10.414192 | 33.833333 | 1.0 | 1.301282 | 1.000000 | 2.800000 | 0.575402 |
| 970190 | 26 | 98 | 1 | 3.769231 | 2.5 | 13 | 3.037205 | 621 | 6 | 23.884615 | 18.0 | 71 | 17.673318 | 37.526496 | 1.0 | 1.443327 | 1.138889 | 4.000000 | 0.708957 |
| 970200 | 1 | 3 | 3 | 3.000000 | 3.0 | 3 | NaN | 14 | 14 | 14.000000 | 14.0 | 14 | NaN | 1.000000 | 1.0 | 1.000000 | 1.000000 | 1.000000 | NaN |
| 970300 | 31 | 100 | 1 | 3.225806 | 2.0 | 12 | 2.261411 | 672 | 9 | 21.677419 | 17.0 | 74 | 14.246373 | 53.633333 | 1.0 | 1.730108 | 1.500000 | 4.000000 | 0.844376 |
| 970400 | 6 | 16 | 1 | 2.666667 | 2.5 | 4 | 1.211060 | 100 | 7 | 16.666667 | 14.0 | 30 | 9.025889 | 7.000000 | 1.0 | 1.166667 | 1.000000 | 1.750000 | 0.302765 |
| 970500 | 4 | 11 | 1 | 2.750000 | 2.0 | 6 | 2.217356 | 75 | 8 | 18.750000 | 15.0 | 37 | 12.632630 | 6.666667 | 1.0 | 1.666667 | 1.500000 | 2.666667 | 0.816497 |
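In the sample rows, every `_std` column is NaN exactly where `HS06_count` is 1, which is what a sample standard deviation (n - 1 denominator) gives for a single observation. The following is a minimal sketch of such a per-code aggregation, assuming the features were built by grouping per-description lengths by HS06 code. The report does not show the actual pipeline, so the `records` input and the `aggregate` helper are hypothetical, with word counts chosen to be consistent with the rows for 010130 and 010290.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical (HS06 code, word count) pairs, one per goods description.
records = [("010130", 2), ("010290", 2), ("010290", 3)]


def aggregate(rows):
    """Group values by code and compute the summary features seen in the report."""
    groups = defaultdict(list)
    for code, value in rows:
        groups[code].append(value)

    features = {}
    for code, values in groups.items():
        features[code] = {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "mean": statistics.fmean(values),
            "median": float(statistics.median(values)),
            "max": max(values),
            # Sample standard deviation is undefined for a single value,
            # so a group of size 1 gets NaN instead of raising an error.
            "std": statistics.stdev(values) if len(values) > 1 else math.nan,
        }
    return features


features = aggregate(records)
print(features["010290"])  # two descriptions: std is defined
print(features["010130"])  # one description: std is NaN
```

If the NaN values are unwanted downstream, one option is to fill them with 0 for single-description codes, at the cost of making those codes indistinguishable from codes whose descriptions all have the same length.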
Duplicate rows
Most frequently occurring
| | HS06_count | GOODS_DESCRIPTION_len_words_sum | GOODS_DESCRIPTION_len_words_min | GOODS_DESCRIPTION_len_words_mean | GOODS_DESCRIPTION_len_words_median | GOODS_DESCRIPTION_len_words_max | GOODS_DESCRIPTION_len_words_std | GOODS_DESCRIPTION_len_chars_sum | GOODS_DESCRIPTION_len_chars_min | GOODS_DESCRIPTION_len_chars_mean | GOODS_DESCRIPTION_len_chars_median | GOODS_DESCRIPTION_len_chars_max | GOODS_DESCRIPTION_len_chars_std | subtokenization_indicator_sum | subtokenization_indicator_min | subtokenization_indicator_mean | subtokenization_indicator_median | subtokenization_indicator_max | subtokenization_indicator_std | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 10 | 10 | 10.0 | 10.0 | 10 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 21 |
| 0 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 4 | 4 | 4.0 | 4.0 | 4 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 15 |
| 17 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 11 | 11 | 11.0 | 11.0 | 11 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 14 |
| 21 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 12 | 12 | 12.0 | 12.0 | 12 | NaN | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | NaN | 11 |
| 24 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 13 | 13 | 13.0 | 13.0 | 13 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 11 |
| 1 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 5 | 5 | 5.0 | 5.0 | 5 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 9 |
| 20 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 12 | 12 | 12.0 | 12.0 | 12 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 8 |
| 5 | 1 | 1 | 1 | 1.0 | 1.0 | 1 | NaN | 7 | 7 | 7.0 | 7.0 | 7 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 6 |
| 13 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 9 | 9 | 9.0 | 9.0 | 9 | NaN | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | NaN | 6 |
| 18 | 1 | 2 | 2 | 2.0 | 2.0 | 2 | NaN | 11 | 11 | 11.0 | 11.0 | 11 | NaN | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | NaN | 6 |